Comparative genomics of rainbow trout (Oncorhynchus mykiss): Is the genetic architecture of migratory behavior conserved among populations?

Abstract Rainbow trout (Oncorhynchus mykiss) are a partially migratory species wherein some individuals undergo long‐distance anadromous migrations, and others stay as residents in their native freshwater streams. The decision to migrate is known to be highly heritable, and yet, the underlying genes and alleles associated with migration are not fully characterized. Here we used a pooled approach of whole‐genome sequence data from migratory and resident trout of two native populations—Sashin Creek, Alaska and Little Sheep Creek, Oregon—to obtain a genome‐wide perspective of the genetic architecture of resident and migratory life history. We calculated estimates of genetic differentiation, genetic diversity, and selection between the two phenotypes to locate regions of interest and then compared these associations between populations. We identified numerous genes and alleles associated with life history development in the Sashin Creek population with a notable area on chromosome 8 that may play a critical role in the development of the migratory phenotype. However, very few alleles appeared to be associated with life history development in the Little Sheep Creek system, suggesting population‐specific genetic effects are likely important in the development of anadromy. Our results indicate that a migratory life history is not controlled by a singular gene or region but supports the idea that there are many independent ways for a migratory phenotype to emerge in a population. Therefore, conserving and promoting genetic diversity in migratory individuals is paramount to conserving these populations. Ultimately, our data add to a growing body of literature that suggests that population‐specific genetic effects, likely mediated through environmental variation, contribute to life history development in rainbow trout.


| INTRODUC TI ON
The movement of animals to exploit seasonally available resources plays an important role in the development, maintenance, and overall health of many different ecosystems. This process, known as migration, is characterized by an onset of physiological and behavioral changes and is regulated by a combination of genetic factors and environmental cues (Åkesson & Hedenström, 2007;Liedvogel et al., 2011). Although migration occurs in many taxa, some species demonstrate a partially migratory life history, consisting of individuals who undergo the migratory process and those that remain as nonmigratory residents (Chapman et al., 2011). Although both behavioral types often coexist within the same ecosystem, the differences between the two ecotypes can lead to ecological and adaptive differences that are exclusive to each life history (Chapman et al., 2011;Gómez-Bahamón et al., 2020;Winker, 2010).
Anthropogenic disturbances, such as climate change, dam construction, and habitat modifications, are disproportionately affecting migratory individuals compared to resident conspecifics. Thus, providing an increased need to understand the extent to which migratory individuals are able to adapt to these changing environments and the underlying molecular mechanisms associated with these adaptations (Liedvogel et al., 2011;Singh & Milner-Gulland, 2011;Wilcove & Wikelski, 2008).
Salmonids (salmon, trout, and charr) are an economically important group of fishes that are valued widely for both sportfishing and human consumption and play an important role in native subsistence fishing. One of the most abundant and well studied is the rainbow trout (Oncorhynchus mykiss), which is present in two distinct ecotypes; the rainbow (resident) and steelhead (migratory) trout. Heritability studies on O. mykiss have found strong evidence of an additive genetic component to migratory tendency (Hecht et al., 2015), and quantitative trait loci (QTL) studies have found loci on different linkage groups associated with multiple traits related to smoltification (the process by which resident individuals prepare for their anadromous migrations; Hecht et al., 2012;Le Bras et al., 2011;Nichols et al., 2008). More recently, genome-wide association studies (GWAS) approaches have found associations between polymorphic positions within the rainbow trout genome and life history development (Arostegui et al., 2019;Hale et al., 2013;Hecht et al., 2013;Weinstein et al., 2019), suggesting that migratory behavior is a polygenic trait. Additionally, an inversion on chromosome 5 has been implicated in contributing to the migratory condition (Campbell et al., 2021;Pearse et al., 2019), especially in coastal populations from California and southern Oregon (but see Arostegui et al., 2019).
However, many previous studies have used reduced-representation sequencing methods (i.e., RAD-seq and SNP chips) which leave large regions of the genome unqueried making it difficult to dissect the genetic architecture of this complex trait. Therefore, in order to accurately determine the genes and alleles associated with life history development, whole-genome sequence data are required.
Systematically studying the genetic basis of migration in O. mykiss is challenged by accurately identifying migrants and residents within populations. This study capitalizes on information collected on residency and anadromy in two comprehensively studied systems: Sashin Creek and Little Sheep Creek. The Sashin Creek system, located in Southeastern Alaska, has become a model population to evaluate the genetic differentiation between anadromous and resident rainbow trout. This site contains a resident population formed by transplanting juveniles of mixed (anadromous and resident) parentage above two barrier waterfalls into Sashin Lake in 1926 . These barriers allow juvenile smolts to leave the lake but prevent adult steelhead from returning to spawn, creating an allopatric resident population purged of most migratory alleles. Located below the barrier waterfalls in Sashin Creek is the ancestral population with access to the Pacific Ocean, which consists largely of migratory O. mykiss. Migratory and resident O. mykiss in this system have evolved broad genetic differences that appear to be associated with traits related to migration and smoltification (Hale et al., 2016;Hecht et al., 2015;McKinney et al., 2015;Weinstein et al., 2019). However, the previous studies in this system have used reduced-representation genome methods and have not provided detailed genome-wide data to assess the alleles and genes associated with migratory tendencies. Additionally, in-depth analyses regarding the associations between migratory behavior, physiology, and genetics have been largely limited to the Sashin Creek system, making it impossible to test if, and to what extent, alleles associated with migration in Sashin Creek are shared with other populations (although see Campbell et al., 2021;Hecht et al., 2013;Pearse et al., 2019). To that end, we also investigated the genetic basis of anadromy in the Little Sheep Creek population in Oregon that lacks physical barriers to gene flow allowing both ecotypes to reproduce in the same environment (Berntson et al., 2011).
Herein, we use whole-genome pooled-sequencing (pooled-seq) approaches to evaluate both the shared and population-specific genetic components of migratory and resident life histories among two geographically separated populations of O. mykiss (Sashin Creek, AK and Little Sheep Creek, OR). These data were analyzed to locate (1) regions of the genome associated with anadromy, (2) compare the locations of these regions between the two populations studied, and (3) link these polymorphisms within underlying protein-coding genes. The results contribute to an increasing body of information dedicated to understanding the shared and population-specific genetic control of migratory behavior in O. mykiss.

| Sampling and DNA extraction
Resident rainbow and anadromous steelhead trout were sampled from two populations: Sashin Creek, Alaska and Little Sheep Creek (LSC), Oregon. In the Sashin system, all residents were sampled from Sashin Lake and all sampled migrants were adults returning to spawn in Sashin Creek. In the LSC system, migrants and residents coexist, so accurate phenotype determination was critical in classifying life history type. The migratory phenotype was determined using physical traits including silvery coloration and body shape, whereas residents were determined by the expression of gametes (as sexual maturity precludes anadromy in rainbow trout; more sampling details can be found in Berntson et al., 2011;. To prevent phenotyping of precocious juveniles, only adult fish with fork lengths (tip of snout to the fork of the caudal fin) greater than 150 mm were sampled. To ensure that only wild fish, and not hatchery fish, were collected for sampling, only fish with the full adipose fin were collected. At sampling, the fork length and sex of the fish were recorded. In the Sashin Creek system, samples were collected in the form of fin clips, and LSC samples included both fin clips and operculum punches; all samples were preserved in 95% ethanol. Fish were immediately released after processing.

| Pooled-sequencing library preparation
DNA was extracted from 174 fish-40 Sashin Creek migrants, 40 Sashin Creek residents, 48 LSC migrants, and 46 LSC residents using DNeasy Tissue Extraction kits (QIAGEN Corporation). Pooledseq was used to provide a cost and time-efficient way of sequencing many individuals with a high depth of coverage. The utility of pooled-seq methods in nonmodel organisms has been discussed in detail in Micheletti and Narum (2018). Pooled-seq involves combining DNA from multiple samples of the same group into a singular pool which represents the population-level allele frequencies (Schlötterer et al., 2014). To ensure equal representation of each individual in the pool, every DNA sample was normalized to a standard of 300 μg/L before combining samples into pools.

| Pool-Seq alignment and filtering
Sequences were analyzed following the PoolParty pipeline (https:// github.com/Steve nMich elett i/poolp arty), as detailed in Micheletti and Narum (2018) with custom modifications. Sequences were first quality filtered to remove reads with poor quality bases (Q values <20) or that measured <50 base pairs in length. Quality-filtered sequences were then aligned to the rainbow trout genome (Omyk_1.0; GCA_002163495.1) using bwa mem with default settings ). Duplicate sequences were removed using samblaster (Faust & Hall, 2014), and samtools Li, 2011) was used to compile alignments and filter the reads to a minimum length of 36 and a quality score of >30 (Kofler et al., 2016). Alignments were then converted to the mpileup file format (in samtools), which summarizes total coverage information for each pool ) and then combined to a synchronized (sync) file to be used for analysis in Popoolation2 (Kofler, Pandey, & Schloetterer, 2011) and PoPoolations (Kofler, Orozco-terWengel, et al., 2011). Positions from each of the four pools were retained for use in statistical calculations if they fell between the minimum threshold of 25× depth of coverage, and the maximum threshold of 250× depth of coverage to reduce the chances of analyzing paralogs and sequences with low coverage. All candidate SNPs were required to have a minor allele frequency of at least 0.05 and each rare allele had to be present at least twice within each pool.  (Kofler, Orozco-terWengel, et al., 2011) using -min-count 2 and -min-qual 20. Using the same parameters as the Tajima's D statistic, Watterson's theta value was also calculated for each site using Popoolations to provide an estimate of the genetic diversity present between ecotypes from each location (Kofler, Orozco-terWengel, et al., 2011). Theta estimates were determined also using a nonoverlapping window size of 10,000 base pairs.

| RE SULTS
Aligning the pooled sequencing to the rainbow trout genome produced an average depth per position of 30 reads for Little Sheep Creek and 115 reads for Sashin pools. A total number of qualityfiltered reads as well as average coverage depth are reported for each population-SM, SR, LSM, and LSR in Table 1 Figure S1). There were 16 genome regions that showed a highindividual F ST in both Sashin and LSC that were further evaluated to detect if protein-coding genes present in those regions may be the subject of selection (Table 2). Of these, only one SNP, located on chromosome 25, was within a protein-coding region, found in RNA polymerase II-associated protein one (RPAP1).

TA B L E 1
Total number of quality-filtered (QF) reads for each of the four populations, and the average depth of coverage of each pool assuming the published rainbow trout genome size of 2.1 Gb. Tajima's D statistic was then calculated over adjacent 10,000 base pair windows across the genome of each of the four pools ( Figure 2) to determine the type of selection event occurring in each window. We limited Tajima's D values to those that were greater than or equal to 2 or less than or equal to negative 2, which suggest a significant deviation from neutrality occurred (Tajima, 1989). This threshold identified 112 significant regions in the LSR collection, of which 94 positions were negative. In the SR collection, we identified 24,192 regions using the same parameters, of which all but three positions were negative.
Between the two resident groups, 94 regions shared a significant value over the same window, and all were negative. In the migrant groups,   flow between ecotypes in LSC. Unsurprisingly, this, combined with a major population bottleneck that occurred when the Sashin Lake system was founded, has led to a higher degree of genetic differentiation between ecotypes in Sashin compared with LSC.

| DISCUSS ION
Although our findings suggest that the high degree of differentiation between life histories in Sashin Creek is due to genome-wide

F I G U R E 3 Watterson's theta values for (a) Sashin residents, (b) Sashin migrants, (c) Little Sheep Creek residents, and (d) Little Sheep
Creek migrants plotted in gray. Watterson's theta was estimated over a nonoverlapping 10 kb window. Gray points represent individual estimates of Watterson's theta within the 10 kb window. Blue lines represent the smoothed average theta values. There could be a connection between SNPs within lepr and migration as leptin modulates appetite and energy use (Gong et al., 2013;Murashita et al., 2008). Alterations in leptin receptor function or binding affinity between the receptor and leptin could be advantageous during periods with low food availability, and recent research indicates that leptin plays a role in migratory success (Choi et al., 2014;Fuentes et al., 2012). Leptin manages energy metabolism and body weight, both of which have been shown to be associated with migratory behavior through QTL studies (Hecht et al., 2012;Nichols et al., 2008). Although no SNPs were found in exon regions, these SNPs could be connected to differences in gene expression in lepr. Although previous RNA-seq experiments using samples from Sashin Creek did not find lepr to be differentially expressed (Hale et al., 2016;McKinney et al., 2015), these studies sampled juveniles and so patterns of gene expression of lepr in adult trout have, to the best of our knowledge, not been measured.
Although most evidence points toward population-specific ge-  Table 2), and four of these were located within known genes.
These genes are involved in a wide range of functions, including metabolism and protein synthesis, which could have downstream effects on smoltification or regulation of life history development (McCormick, 2013;Stefansson et al., 2012). Specifically, the gene PI-PLC (phosphatidylinositol-specific phospholipase C, X domain containing 1 (chr1; bp 13331)) was found to segregate based on phenotypes in both LSC and Sashin. This gene encodes for phospholipase C involved in hydrolyzing phospholipids into fatty acids and other lipophilic molecules (reviewed by Fukami, 2002). Evidence for the connection between metabolism and migration is vast, and it has been suggested that standard metabolic rate may be a required threshold for smoltification in O. mykiss, as it is associated with dominance, food acquisition, and energy usage (Sloat & Reeves, 2014).
Smolts alter their metabolic state in preparation for movement into seawater (McCormick, 2013;Stefansson et al., 2012), which is generally accompanied by a decrease in fatty stores, causing a change in condition factor that occurs in migratory salmonids before they leave freshwater (McCormick & Saunders, 1987;Sheridan, 1989). Therefore, polymorphisms within or nearby genes involved in metabolism might be under differential selection between O. mykiss ecotypes.

| Nucleotide diversity in Sashin and Little Sheep Creek
Overall measurements of nucleotide diversity show strong departures from neutrality for the Sashin Lake residents which may be indicative of underlying purifying selection .
These patterns were not mirrored in LSC which showed Tajima Prior studies at Sashin have not suggested that this region of the genome is strongly associated with anadromy, but QTL studies in other populations have identified smoltification-related traits that are localized to chromosome 8, including body morphology and increased growth and weight during the spring smoltification period (Nichols et al., 2008). This region also contained at least one gene with a known connection to photoperiod (s-antigen visual arrestin; sag) and genes involved in photoperiod have been shown to be differentially expressed between life histories in Sashin Creek (Hale et al., 2016;McKinney et al., 2015). The Sashin Creek system is higher in latitude than LSC and therefore sensitivity to photoperiod-and a need for rapid growth in the short Alaskan summer-may explain why selection operating in this region of the genome is stronger in Sashin than in LSC. Although it is currently unknown if genes connected to photoperiod are also differentially expressed in other populations of O.

mykiss.
A total of 10 genome regions were discovered that showed Tajima's D values less than −2 in both resident pools (Table 3).
Although none of the genes located within these regions had previously been found to be associated with life history development in O. mykiss, proteins produced by fibroblast growth factor 12 (fgf12) and histone deacetylase 7 (hdac7) are connected to two genes (fgfrl1 and hdac11) previously shown by Hale et al. (2016) to be differentially expressed between smolts and juvenile residents. These parallels could be indicative of the importance of cell signaling roles of fibroblast growth factors and potential protection from transcription factors by histone deacetylases. This could implicate epigenetic regulation or gene-by-environment interactions as significant contributing factors in the determination of life history type (Baerwald et al., 2016).
Regions of the genome suggesting convergent selection in migrants were identified by locating Tajima's D values greater than two or less than minus two in Sashin and LSC migrant pools (SM and LSM) that were not mirrored in residents (Table 4). As with the resident pools, all identified regions contained a negative Tajima's D which might be indicative of purifying selection (Tajima, 1989).
A total of nine genes were identified over eight 10 kb regions with roles varying from protein interactions and binding to ion-exchange capacity, again, supporting the conclusion that genes involved in life history development are many and varied. For example, the standard metabolic rate is often different between anadromous and resident salmonids (Sloat & Reeves, 2014), so the role of could be related to these adaptations and subsequent conversion of metabolic products. Several genes connected to ion binding and exchange were also found in regions of the genome with negative Tajima's D values (e.g., eps1511). This gene is especially interesting as ion exchange is crucial during smoltification (McCormick, 2013;Stefansson et al., 2012). To survive in freshwater, resident individuals are required to actively uptake ions (Na 2+ , Cl − , Ca 2+ ) to compensate for passive ion loss (McCormick, 2013;Stefansson et al., 2012).
However, in saltwater, this mechanism is reversed, and individuals are required to actively dispel those same ions to remain in homeostasis. Therefore, any mechanism that improves the function or tran- in this region of the genome with the data generated herein is unsurprising as previous studies failed to find the ancestral (or anadromous) form of the inversion in both Sashin and LSC (Hale et al., 2013;Weinstein et al., 2019). Taken together, studies strongly suggest that life history control in rainbow trout appears to be determined by population-specific genetic effects (e.g., Hale et al., 2013;Hecht et al., 2013;Nichols et al., 2008)  (lead); funding acquisition (lead); methodology (supporting); project administration (lead); supervision (supporting); writing -original draft (supporting); writing -review and editing (supporting).

ACK N OWLED G M ENTS
The authors would like to thank Marlo Jeffries, Evan Barfuss, Devon Pearse, and two anonymous reviewers for their comments and feedback on previous drafts of the manuscript. Funding for this research was provided from grants received through Texas Christian University, including the SERC and RCAF.

FU N D I N G I N FO R M ATI O N
Funding for this research was provided by two TCU grants awarded to Matthew Hale.

CO N FLI C T O F I NTER E S T S TATEM ENT
The authors declare that they do not have any conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Sequence data generated for this study have been deposited in the National Center for Biotechnology Information Sequence Read Archive under BioProject PRJNA978528.